Study of Distractors in Neural Models of Code
Finding important features that contribute to the prediction of neural models
is an active area of research in explainable AI. Neural models are opaque and
finding such features sheds light on a better understanding of their
predictions. In contrast, in this work, we present an inverse perspective of
distractor features: features that cast doubt about the prediction by affecting
the model's confidence in its prediction. Understanding distractors provides a
complementary view of the features' relevance in the predictions of neural
models. In this paper, we apply a reduction-based technique to find distractors
and provide our preliminary results of their impacts and types. Our experiments
across various tasks, models, and datasets of code reveal that the removal of
tokens can have a significant impact on the confidence of models in their
predictions and the categories of tokens can also play a vital role in the
model's confidence. Our study aims to enhance the transparency of models by
emphasizing those tokens that significantly influence the confidence of the
models.
Comment: The 1st International Workshop on Interpretability and Robustness in
Neural Software Engineering, Co-located with ICSE (InteNSE'23)
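The reduction idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simple leave-one-out reduction, and `find_distractors`, `toy_model`, and the `min_gain` threshold are hypothetical names introduced only for illustration. A token is flagged as a distractor when removing it leaves the predicted label unchanged but raises the model's confidence.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
# Assumption: a "model" is any callable returning (label, confidence in [0, 1]).
Model = Callable[[Tokens], Tuple[str, float]]


def find_distractors(
    tokens: Tokens, model: Model, min_gain: float = 0.05
) -> List[Tuple[int, str, float]]:
    """Leave-one-out reduction: return (index, token, gain) for each token whose
    removal keeps the predicted label and raises confidence by >= min_gain."""
    label, base_conf = model(tokens)
    found = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # drop exactly one token
        new_label, new_conf = model(reduced)
        gain = new_conf - base_conf
        if new_label == label and gain >= min_gain:
            found.append((i, tok, gain))
    # Strongest distractors first.
    return sorted(found, key=lambda item: item[2], reverse=True)


def toy_model(tokens: Tokens) -> Tuple[str, float]:
    """Hypothetical stand-in for a neural model of code. It always predicts
    'sort'; 'swap' tokens raise its confidence and 'tmp' tokens lower it."""
    conf = 0.5 + 0.2 * tokens.count("swap") - 0.15 * tokens.count("tmp")
    return "sort", max(0.0, min(1.0, conf))
```

With a real model, `model` would wrap a forward pass and return the softmax probability of the predicted class; the loop then costs one model call per token.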
Towards non-intrusive software introspection and beyond
Continuous verification and security analysis of software systems are of paramount importance to many organizations. The state of the art for such operations relies on agent-based approaches to inspect the provisioned software stack for security and compliance issues. However, this approach, which runs agents on the systems being analyzed, is vulnerable to some attacks, can incur substantial performance impact, and can introduce significant complexity. In this paper, we present the design and prototype implementation of a general-purpose approach for Non-intrusive Software Introspection (NSI). By adhering to NSI, organizations hosting in the cloud can also control the software introspection workflow with reduced trust in the provider. Experimental analysis of real-world applications demonstrates that NSI is a lightweight and scalable approach with a negligible impact on the performance of applications running on the instance being introspected.
Accepted manuscript
Non-intrusive Virtual Systems Monitoring
In this thesis, I discuss why existing intrusive systems monitoring approaches are not a good fit for the modern virtualized cloud, and describe two alternative out-of-band solutions that leverage virtualization for better systems monitoring.
My first solution employs Virtual Machine Introspection (VMI) to gain access to a VM's runtime state from the virtualization layer. I develop new VMI techniques to efficiently expose VM memory state from outside the VM boundary, which can be readily employed in existing cloud platforms as they are designed to operate with no new modifications or dependencies. While a variety of competing alternatives exist, their latency, overhead, complexity and consistency trade-offs are not clear. Thus, I begin my thesis by addressing this gap: I organize the existing VMI techniques into a taxonomy based upon their operational principles and explore their trade-offs thoroughly, both qualitatively and quantitatively. I further present a deep dive on VMI consistency aspects to understand the sources of inconsistency in observed VM state, and show that commonly employed VMI solutions yield only marginal consistency benefits despite their prohibitive overheads.
Then, I present NFM (Near Field Monitoring), a new approach that decouples system execution from monitoring by pushing monitoring components out of the target systems' scope. By extending and combining VMI with a backend cloud analytics platform, NFM provides simple, standard interfaces to monitor running systems in the cloud that require no guest cooperation or modification, and have minimal effect on guest execution. By decoupling monitoring and analytics from target system context, NFM provides always-on monitoring, even when the target system is unresponsive.
My second solution, CIVIC (Cloning and Injection based VM Inspection and Customization), avoids NFM's functionality duplication effort and overcomes its VMI-related limitations arising out of its raw memory byte level visibility into the guest. CIVIC operates at a logical OS level and reuses the vast stock monitoring software codebase, but in a separate isolated environment, thus avoiding guest intrusion and interference hassles. CIVIC enables a broader usage scope in addition to NFM's passive (read-only) monitoring, by supporting actuation or on-the-fly introduction of new functionality. It restricts all impact and side-effects of such customization operations inside a live clone of the guest VM. New functionality over the replicated VM state is introduced using code injection.
I present four applications built on top of NFM using its 'systems as data' monitoring approach to showcase its capabilities for across-systems and across-time analytics.
I also highlight CIVIC's versatility in terms of enabling hotplugged and impact-free live customization, by employing it to monitor, inspect, troubleshoot and tune unmodified VMs.
Ph.D
Gamma Squeezing: An analysis of the market impact of institutional equity options market makers on the underlying market
With the increase in retail investors expressing opinions on companies through equity options during the peak of COVID-19 activity, as perhaps best illustrated by increasing activity on forums such as the Reddit community wallstreetbets, derivatives exchanges have increased the frequency with which equity options expire. For instance, Tesla options have weekly expiries, the Russell and Nasdaq indices have expiries on Mondays, Wednesdays, and Fridays, and the S&P has daily expiries. Consequently, it is of great interest to people who participate in equities markets to understand how option expiry affects equity price movement. This thesis aims to demonstrate that, as a result of institutional options market makers hedging their exposure to the price of the underlying asset, the prices of equities with options experience either large moves or virtually no move in the last few hours of the afternoon on the day of expiry. One well-studied specific instance of this phenomenon is pinning, where on the day of expiry, the equity price tends to close very near some option strike price. Beyond its interest to those who wish to participate in equities trading, such a result offers one possible refutation of the efficient market hypothesis, since such a price movement in equities does not reflect any information or sentiment about the company, but is rather an expression of the risk aversion of certain market participants in a correlated product.
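A rough numerical sketch can show why hedging flows behave so differently close to expiry. The abstract does not state the thesis's pricing model, so this assumes the standard Black-Scholes formulas for a European call with no dividends. The function names, the 252-session and 6.5-hour trading calendar, and the contract multiplier of 100 are illustrative assumptions.

```python
import math

# Assumption: 252 trading sessions per year, 6.5 trading hours per session.
TRADING_HOURS_PER_YEAR = 252 * 6.5


def call_delta_gamma(spot, strike, vol, t_years, rate=0.0):
    """Black-Scholes delta and gamma of a European call (no dividends)."""
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    delta = 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))            # N(d1)
    density = math.exp(-0.5 * d1 ** 2) / math.sqrt(2.0 * math.pi)  # phi(d1)
    gamma = density / (spot * vol * sqrt_t)
    return delta, gamma


def rehedge_shares(spot, strike, vol, t_years, move, contracts, multiplier=100):
    """First-order estimate of the shares a delta-hedged market maker must
    trade after the underlying moves by `move`: gamma * move * position size."""
    _, gamma = call_delta_gamma(spot, strike, vol, t_years)
    return gamma * move * contracts * multiplier


one_hour = 1.0 / TRADING_HOURS_PER_YEAR
one_month = 21.0 / 252.0

# Gamma one hour and one month before expiry, at the money (strike 100)
# and 10% out of the money (strike 110).
_, gamma_atm_hour = call_delta_gamma(100.0, 100.0, 0.5, one_hour)
_, gamma_atm_month = call_delta_gamma(100.0, 100.0, 0.5, one_month)
_, gamma_otm_hour = call_delta_gamma(100.0, 110.0, 0.5, one_hour)
_, gamma_otm_month = call_delta_gamma(100.0, 110.0, 0.5, one_month)
```

Under these assumptions, gamma for a strike near the spot price grows as expiry approaches, so each small price move forces a larger re-hedge. Gamma for a strike far from the spot collapses toward zero, so hedging flows there largely disappear.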
Automated Code generation for Information Technology Tasks in YAML through Large Language Models
The recent improvement in code generation capabilities due to the use of
large language models has mainly benefited general purpose programming
languages. Domain specific languages, such as the ones used for IT Automation,
have received far less attention, despite involving many active developers and
being an essential component of modern cloud platforms. This work focuses on
the generation of Ansible-YAML, a widely used markup language for IT
Automation. We present Ansible Wisdom, a natural-language to Ansible-YAML code
generation tool, aimed at improving IT automation productivity. Ansible Wisdom
is a transformer-based model, extended by training with a new dataset
containing Ansible-YAML. We also develop two novel performance metrics for YAML
and Ansible to capture the specific characteristics of this domain. Results
show that Ansible Wisdom can accurately generate Ansible scripts from natural
language prompts with performance comparable to or better than existing state of
the art code generation models.